A Rough Set Interpretation of User's Web Behavior: A Comparison with Information Theoretic Measure
Author
Abstract
Searching for relevant information on the World Wide Web is often a laborious and frustrating task for casual and experienced users alike. To help improve web searching through a better understanding of user characteristics, we address the following research questions:

- What light can rough set theory shed on users' web behavior?
- What kind of rules can be extracted from a decision table that summarizes user behavior in terms of a set of multi-valued attributes?
- What kind of decision rules can be extracted from a decision table using an information theoretic measure?

Yao (2003) compared the granulation results of decision-making systems based on rough sets with those of information theoretic granulation methods. We concur with Yao that, although the rules extracted by the Rough Set (RS) and Information Theoretic (IT) approaches may be identical, the interpretation of the decision is richer with RS than with IT.

General Introduction to Rough Set Theory and Decision Analysis

The rough set approach to data analysis and modeling (Pawlak 1997, 2002) has the following advantages:
a- It is based on the original data and does not need any external information (probability or grade of membership);
b- It is a suitable tool for analyzing both quantitative and qualitative attributes;
c- It provides efficient algorithms for finding hidden patterns in data;
d- It finds minimal sets of data (data reduction);
e- It evaluates the significance of data.

We show that rough set theory is a useful tool for the analysis of decision situations, in particular multi-criteria sorting problems. It deals with vagueness in the representation of a decision situation, caused by the granularity of that representation. The rough set approach produces a set of decision rules involving a minimum number of the most important criteria. It does not correct the vagueness manifested in the representation; instead, the rules it produces are categorized as deterministic or non-deterministic.
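The abstract compares rules obtained with rough sets against rules obtained with an information theoretic measure, but it does not say which measure is used. For comparison only, the sketch below assumes ID3-style information gain. This is the Shannon entropy of the decision minus the expected entropy that remains after splitting on one condition attribute. The toy table, the attribute names `c1`, `c2`, `d` and the helper names are hypothetical and do not come from the paper's Table 1.

```python
import math
from collections import Counter

# Hypothetical toy decision table (NOT the paper's data): each row maps
# attribute names to discretized values; "d" is the decision attribute.
TABLE = [
    {"c1": "S", "c2": "M", "d": "yes"},
    {"c1": "S", "c2": "L", "d": "yes"},
    {"c1": "M", "c2": "M", "d": "no"},
    {"c1": "M", "c2": "L", "d": "no"},
    {"c1": "L", "c2": "M", "d": "yes"},
    {"c1": "L", "c2": "M", "d": "no"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of decision values."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def information_gain(table, attr, decision="d"):
    """Reduction in decision entropy obtained by splitting the table on attr."""
    base = entropy([row[decision] for row in table])
    groups = {}
    for row in table:
        groups.setdefault(row[attr], []).append(row[decision])
    remainder = sum(len(g) / len(table) * entropy(g) for g in groups.values())
    return base - remainder


if __name__ == "__main__":
    for attr in ("c1", "c2"):
        print(attr, round(information_gain(TABLE, attr), 3))
```

On this toy table, `c1` has a gain of about 0.667 bit and `c2` has a gain of 0. An IT method would therefore rank `c1` first. The rough set view instead asks which condition classes are consistent with a single decision.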
The set of decision rules explains a decision policy and may be used for decision support. Mathematical decision analysis aims to bring to light those elements of a decision situation that are not evident to the actors and that may influence their attitude towards the situation. More precisely, the elements revealed by mathematical decision analysis do one of two things. They either explain the situation, or they prescribe or simply suggest some behavior that increases the coherence between the evolution of the decision process on the one hand and the goals and value system of the actors on the other.

Pawlak (1982) gave a formal framework for discovering facts from the representation of a decision situation, called rough set theory. Rough set theory assumes that the situation is represented by a decision table, which is a special case of an information system. Rows of this table correspond to objects (actions, alternatives, candidates, patients, etc.) and columns correspond to attributes. For each pair (object, attribute) there is a known value called a descriptor. Each row of the table contains descriptors representing information about the corresponding object of a given decision situation.

In general, the set of attributes is partitioned into two subsets: condition attributes (criteria, tests, symptoms, etc.) and decision attributes (decisions, classifications, taxonomies, etc.). In decision problems the term criterion is often used instead of condition attribute. Note, however, that the latter is more general than the former. The domain (scale) of a criterion has to be ordered according to decreasing or increasing preference, while the domain of a condition attribute need not be ordered. Similarly, the domain of a decision attribute may or may not be ordered.
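A decision table of this kind maps directly onto a small data structure. The sketch below is a minimal, hypothetical illustration: the toy rows and the names `CONDITIONS`, `DECISION` and `extract_rules` are assumptions made for the example, not taken from the paper. Objects with identical condition descriptors are grouped, and each group yields one rule. A rule is deterministic when every object in its group has the same decision, and non-deterministic otherwise.

```python
from collections import defaultdict

# Hypothetical decision table: one dict per object (row); each entry is a
# descriptor for the pair (object, attribute). Not the paper's data.
CONDITIONS = ["c1", "c2"]
DECISION = "d"
TABLE = [
    {"c1": "S", "c2": "M", "d": "yes"},
    {"c1": "S", "c2": "L", "d": "yes"},
    {"c1": "M", "c2": "M", "d": "no"},
    {"c1": "M", "c2": "L", "d": "no"},
    {"c1": "L", "c2": "M", "d": "yes"},
    {"c1": "L", "c2": "M", "d": "no"},
]


def extract_rules(table, conditions, decision):
    """Return (condition descriptors, sorted decisions, is_deterministic)
    for each group of objects sharing all condition-attribute values."""
    groups = defaultdict(set)
    for row in table:
        key = tuple((a, row[a]) for a in conditions)
        groups[key].add(row[decision])
    return [(dict(key), sorted(decs), len(decs) == 1) for key, decs in groups.items()]


if __name__ == "__main__":
    for cond, decs, det in extract_rules(TABLE, CONDITIONS, DECISION):
        lhs = " AND ".join(f"{a}={v}" for a, v in cond.items())
        kind = "deterministic" if det else "non-deterministic"
        print(f"IF {lhs} THEN {DECISION} in {decs} ({kind})")
```

Four of the five rules are deterministic. The two objects with `c1=L` and `c2=M` are indiscernible but have different decisions, so their rule is non-deterministic.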
A multi-criteria sorting problem consists in assigning each object to an appropriate predefined category (for instance, acceptance, rejection or request for additional information). In this case, rough set analysis involves:
a- evaluation of the importance of particular criteria;
b- construction of minimal subsets of independent criteria having the same discernment ability as the whole set;
c- the non-empty intersection of those minimal subsets, giving a core of criteria that cannot be eliminated without disturbing the ability to approximate the decision;
d- elimination of redundant criteria from the decision table;
e- generation of sorting rules (deterministic or not) from the reduced decision table, which explain a decision;
f- development of a strategy that may be used for sorting new objects.

Rough Set Modeling of User Web Behavior

Rough set theory is based on the assumption that every object of the universe of discourse is associated with some information. Objects characterized by the same information are indiscernible in view of the available information. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Rough sets and fuzzy sets are different concepts, since they address different aspects of imprecision.

Rough set analysis can be used in a wide variety of disciplines; wherever large amounts of data are produced, rough sets can be useful. Important application areas include:
- medical diagnosis and pharmacology;
- stock market prediction, financial data analysis and banking;
- market research;
- information storage and retrieval systems;
- pattern recognition, including speech and handwriting recognition;
- control system design and image processing.

Next, we present some basic concepts of rough set theory. The web characteristics of 20 users from Roane State were studied.
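For a small table, the indiscernibility classes, the minimal subsets of criteria (reducts) and their core can all be computed by brute force. The sketch below assumes that "discernment ability" is measured by the positive region. The positive region is the set of objects whose indiscernibility class falls inside a single decision class, which is the usual choice in Pawlak's framework. The XOR-style toy table and all function names are hypothetical. Exhaustive search is exponential in the number of criteria, so this approach only suits tables with a handful of attributes.

```python
from itertools import combinations

# Hypothetical XOR-like table: d depends jointly on c1 and c2; c3 duplicates c2.
CONDITIONS = ["c1", "c2", "c3"]
DECISION = "d"
TABLE = [
    {"c1": "S", "c2": "S", "c3": "S", "d": "yes"},
    {"c1": "S", "c2": "M", "c3": "M", "d": "no"},
    {"c1": "M", "c2": "S", "c3": "S", "d": "no"},
    {"c1": "M", "c2": "M", "c3": "M", "d": "yes"},
]


def partition(table, attrs):
    """Indiscernibility classes (sets of row indices) w.r.t. the attributes attrs."""
    classes = {}
    for i, row in enumerate(table):
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(classes.values())


def positive_region(table, attrs, decision):
    """Rows whose attrs-class is consistent, i.e. carries a single decision value."""
    pos = set()
    for cls in partition(table, attrs):
        if len({table[i][decision] for i in cls}) == 1:
            pos |= cls
    return pos


def reducts(table, conditions, decision):
    """All minimal subsets of conditions that preserve the full positive region."""
    full = positive_region(table, conditions, decision)
    found = []
    for size in range(len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if positive_region(table, subset, decision) == full:
                found.append(subset)
    return found


def core(reds):
    """Criteria shared by every reduct: dropping one shrinks the positive region."""
    return set.intersection(*(set(r) for r in reds)) if reds else set()


if __name__ == "__main__":
    reds = reducts(TABLE, CONDITIONS, DECISION)
    print("reducts:", reds)
    print("core:", core(reds))
```

On this table the reducts are {c1, c2} and {c1, c3}, and the core is {c1}. Either c2 or c3 is redundant once the other is kept, while c1 cannot be eliminated.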
The results of the fact-based query “Limbic functions of the brain” are summarized in Table 1 (S = 1, 2; M = 3, 4; L = 5, 7; VL = 8, 9, 10). The notion of a User Modeling System presented here is borrowed from Pawlak (1991). Formally, a User Modeling System (UMS) is represented by $S = (U, \Omega, V, f)$, where:
- $U$ is a nonempty, finite set of users called the universe;
- $\Omega$ is a nonempty, finite set of attributes, $\Omega = C \cup D$, in which $C$ is a finite set of condition attributes and $D$ is a finite set of decision attributes;
- $V = \bigcup_{q \in \Omega} V_q$ is a nonempty set of attribute values, and $V_q$ is the domain of $q$ (for each $q \in \Omega$);
- $f$ is a User Modeling Function $f: U \times \Omega \to V$ such that $f(x, q) \in V_q$ for every $q \in \Omega$ and $x \in U$.
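A User Modeling System can be held in a small Python class. This is a hypothetical sketch based on several assumptions:
- The user modeling function is Pawlak's information function $f: U \times \Omega \to V$ with $f(x, q) \in V_q$, stored as a nested dict.
- Each $V_q$ is the set of values actually observed, although a real domain may be larger.
- The class adds the standard lower and upper approximations of a set of users with respect to a set of attributes.

The users u1 to u4 and their values are invented for illustration and are not from Table 1.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple


@dataclass
class UMS:
    """S = (U, Omega, V, f) with Omega = C ∪ D (hypothetical sketch)."""
    universe: FrozenSet[str]       # U: users
    conditions: FrozenSet[str]     # C: condition attributes
    decisions: FrozenSet[str]      # D: decision attributes
    f: Dict[str, Dict[str, str]]   # f(x, q) stored as f[x][q]

    def __post_init__(self) -> None:
        # U and Omega must be non-empty, and f must be total on U x Omega.
        if not self.universe or not self.attributes:
            raise ValueError("U and Omega must be non-empty")
        for x in self.universe:
            missing = self.attributes - set(self.f.get(x, {}))
            if missing:
                raise ValueError(f"f undefined for user {x} on {sorted(missing)}")

    @property
    def attributes(self) -> FrozenSet[str]:
        """Omega = C ∪ D."""
        return self.conditions | self.decisions

    def domain(self, q: str) -> Set[str]:
        """V_q, taken here (an assumption) as the values of q observed over U."""
        return {self.f[x][q] for x in self.universe}

    def values(self) -> Set[str]:
        """V = union of V_q over all q in Omega."""
        return set().union(*(self.domain(q) for q in self.attributes))

    def block(self, x: str, attrs: Iterable[str]) -> Set[str]:
        """[x]_B: users indiscernible from x on the attributes B = attrs."""
        attrs = list(attrs)
        return {y for y in self.universe if all(self.f[y][a] == self.f[x][a] for a in attrs)}

    def approximations(self, target: Set[str], attrs: Iterable[str]) -> Tuple[Set[str], Set[str]]:
        """Lower and upper approximation of the user set `target` w.r.t. attrs."""
        attrs = list(attrs)
        lower = {x for x in self.universe if self.block(x, attrs) <= target}
        upper = {x for x in self.universe if self.block(x, attrs) & target}
        return lower, upper


if __name__ == "__main__":
    ums = UMS(
        universe=frozenset({"u1", "u2", "u3", "u4"}),
        conditions=frozenset({"c1"}),
        decisions=frozenset({"d"}),
        f={
            "u1": {"c1": "S", "d": "yes"},
            "u2": {"c1": "S", "d": "no"},
            "u3": {"c1": "M", "d": "yes"},
            "u4": {"c1": "L", "d": "no"},
        },
    )
    yes_users = {x for x in ums.universe if ums.f[x]["d"] == "yes"}
    lower, upper = ums.approximations(yes_users, ums.conditions)
    print(sorted(lower), sorted(upper))
```

Here u1 and u2 share c1 = S but differ on d, so they fall in the boundary region between the two approximations. Given c1 alone, only u3 is certainly a "yes" user.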